Prospects for building large timetrees using molecular data with incomplete gene coverage among species.
Abstract
Scientists are assembling sequence data sets from increasing numbers of species and genes to build comprehensive timetrees. However, data are often unavailable for some species and gene combinations, and the proportion of missing data is often large for data sets containing many genes and species. Surprisingly, there has not been a systematic analysis of the effect of the degree of sparseness of the species-gene matrix on the accuracy of divergence time estimates. Here, we present results from computer simulations and empirical data analyses to quantify the impact of missing gene data on divergence time estimation in large phylogenies. We found that estimates of divergence times were robust even when sequences from a majority of genes for most of the species were absent. From the analysis of such extremely sparse data sets, we found that the most egregious errors occurred for nodes in the tree that had no common genes for any pair of species in the immediate descendant clades of the node in question. These problematic nodes can be easily detected prior to computational analyses based only on the input sequence alignment and the tree topology. We conclude that it is best to use larger alignments, because adding both genes and species to the alignment augments the number of genes available for estimating divergence events deep in the tree and improves their time estimates.
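The abstract notes that problematic nodes can be found before any dating analysis, using only the alignment and the tree topology. A minimal sketch of such a check follows. It is not the authors' implementation. It assumes the tree is given as nested tuples of species names and the alignment is summarized as a hypothetical mapping from species to the set of genes sequenced for it. The function name `find_nodes_without_shared_genes` and the example data are illustrative only. A node is flagged when no two of its immediate descendant clades have any gene in common, which is equivalent to no cross-clade pair of species sharing a gene.

```python
from itertools import combinations


def find_nodes_without_shared_genes(tree, genes_by_species):
    """Flag internal nodes whose child clades share no gene.

    tree: nested tuples of species names, e.g. (("A", "B"), "C").
    genes_by_species: dict mapping species name -> iterable of gene names
        (an assumed summary of the species-gene matrix).
    Returns a list with one entry per flagged node; each entry is a tuple
    of the node's child clades, each clade given as a sorted tuple of species.
    """
    flagged = []

    def visit(node):
        # A leaf is a species name; its genes come from the coverage map.
        if isinstance(node, str):
            return frozenset(genes_by_species.get(node, ())), (node,)

        results = [visit(child) for child in node]
        gene_sets = [genes for genes, _ in results]

        # Some cross-clade species pair shares a gene exactly when the
        # gene unions of two child clades intersect.
        if len(results) >= 2 and all(
            a.isdisjoint(b) for a, b in combinations(gene_sets, 2)
        ):
            flagged.append(
                tuple(tuple(sorted(leaves)) for _, leaves in results)
            )

        clade_genes = frozenset().union(*gene_sets)
        clade_leaves = tuple(leaf for _, leaves in results for leaf in leaves)
        return clade_genes, clade_leaves

    visit(tree)
    return flagged


# Hypothetical five-species example with three genes.
example_tree = ((("A", "B"), "C"), ("D", "E"))
example_coverage = {
    "A": {"g1"},
    "B": {"g2"},
    "C": {"g1", "g2"},
    "D": {"g3"},
    "E": {"g3"},
}
problem_nodes = find_nodes_without_shared_genes(example_tree, example_coverage)
for clades in problem_nodes:
    print(" | ".join(",".join(clade) for clade in clades))
```

In this example the split between A and B is flagged because the two species have no gene in common, and the root is flagged because nothing sequenced for A, B, or C was also sequenced for D or E. For nodes with more than two children, flagging only when every pair of child clades is disjoint is an assumption of this sketch.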
Similar resources
Fast and Accurate Estimates of Divergence Times from Big Data.
Ongoing advances in sequencing technology have led to an explosive expansion in the molecular data available for building increasingly larger and more comprehensive timetrees. However, Bayesian relaxed-clock approaches frequently used to infer these timetrees impose a large computational burden and discourage critical assessment of the robustness of inferred times to model assumptions, influenc...
Data Mining for Identification of Forkhead Box O (FOXO3a) in Different Organisms Using Nucleotide and Tandem Repeat Sequences
Background: Deregulation of FOXO3a gene which belongs to Forkhead box O (FOXO) transcription factors, can cause cancer (e.g. breast cancer). FOXO factors have important role in ubiquitination, acetylation, de-acetylation, protein-protein interactions and phosphorylation. Understanding the regulation and mechanisms of FOXO3a can lead to cancer treatment. The aim of this study recent association...
Ribulose-1, 5-Bisphosphate Carboxylase/Oxygenase Gene Sequencing in Taxonomic Delineation of Padina Species in the Northern Coast of the Persian Gulf, (IRAN)
Taxonomic study of the genus Padina (Dictyotales, Phaeophyceae) from the Persian Gulf coast was conducted based on morphology and molecular phylogenetic analyses using chloroplast encoded large subunit RuBisCo (rbcL) gene sequences. Detailed descriptions of each species found in this study are described. Several morphological characters, such as number of cell layers composing the thallus, pr...
Morphological and molecular characterization of three new Fusarium species associated with inflorescence of wild grasses for Iran
In order to explore biodiversity of Fusarium species associated with the inflorescences of poaceous weeds, heads and inflorescences were collected from wild grasses in Ardabil province (Iran). Fusarium species were isolated using general and selective media. Pure cultures were established using a single spore technique. The isolates were identified based on morphological and molecular data. Seq...
Journal title:
- Molecular Biology and Evolution
Volume 31, Issue 9
Pages: -
Publication date: 2014